Error Tracking

What You Will Learn

Why logging exceptions is not the same as tracking errors
How to configure the Sentry Python SDK for a FastAPI service
How to enrich every error with user context, request metadata, and feature flags
How to use breadcrumbs to reconstruct the events leading up to an error
How to write before_send hooks to filter and enrich errors programmatically
How to group related errors using custom fingerprints
How to connect error tracking to your release pipeline for regression detection
How to build an error triage workflow that actually resolves bugs

Prerequisites

Requirement	Details
Python 3.11+	Type hints throughout
FastAPI	All examples use FastAPI
`sentry-sdk[fastapi]`	`pip install "sentry-sdk[fastapi]"`
Lessons 01–03 complete	Logging context and correlation IDs assumed

The Incident: Six Weeks, One Error, Zero Resolution

A production Python service has been logging this message for six weeks:

ERROR:app.services.classifier:Something went wrong

Ten thousand times a day. Nobody knows what it is. Nobody has resolved it. Here is why:

The log has no stack trace - the developer wrote logger.error("Something went wrong") without exc_info=True
There is no user context - you cannot tell if it affects one user or all users
There is no grouping - you cannot tell if it is one bug or fifty different bugs producing the same message
There is no alert - it has been happening since before anyone set up alerting
There is no assignment - nobody owns it

Six weeks later, a new developer finds the code:

try:
    result = self.model.classify(text)
except Exception as e:
    logger.error("Something went wrong")  # The bug
    return {"category": "unknown"}

The exception is a KeyError when the model returns an unexpected category. It has been silently corrupting classification results for six weeks. Revenue impact: unmeasured, probably significant.

Error tracking is not logging. It is a discipline of capturing, grouping, alerting on, and owning every exception your service raises.
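The immediate fix for the handler above is small. Here is a minimal sketch, assuming the model returns a dict with a "label" key (as in the stack trace shown later in this lesson). `FakeModel` and `CATEGORY_MAP` are hypothetical stand-ins. `logger.exception` logs at ERROR level with the traceback attached, which is equivalent to `logger.error(..., exc_info=True)`. The `except` clause is also narrowed to `KeyError`, so unrelated failures still propagate instead of being silently turned into "unknown".

```python
import logging

logger = logging.getLogger("app.services.classifier")

# Hypothetical stand-ins for the real model and label mapping
CATEGORY_MAP = {"technology": "tech", "science": "science", "sports": "sports"}


class FakeModel:
    def predict(self, text: str) -> dict:
        return {"label": "business_finance"}  # a label CATEGORY_MAP does not know


def classify(model: FakeModel, text: str) -> dict:
    try:
        raw_label = model.predict(text)["label"]
        return {"category": CATEGORY_MAP[raw_label]}
    except KeyError:
        # logger.exception attaches exc_info (the full traceback) to the record,
        # so the log line says what failed instead of "Something went wrong"
        logger.exception(
            "Unmapped classifier label; falling back to 'unknown'",
            extra={"text_length": len(text)},
        )
        return {"category": "unknown"}
```

The fallback result is unchanged. The log record now carries the traceback, and an error-tracking integration can report it as a real exception.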

1. Sentry Python SDK Setup

Sentry captures exceptions with full context: stack trace, local variables, request data, user information, and environment metadata. It groups similar exceptions together, shows you their frequency and trend, and alerts you when new errors appear.

Installation

# Core SDK + FastAPI integration
pip install "sentry-sdk[fastapi]"

# For SQLAlchemy query capture in error context
pip install "sentry-sdk[sqlalchemy]"

Basic Initialisation

# app/sentry_config.py
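import logging

import sentry_sdk
from sentry_sdk.integrations.fastapi import FastApiIntegration
from sentry_sdk.integrations.logging import LoggingIntegration
from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration
from sentry_sdk.integrations.starlette import StarletteIntegration

# Defined in app/sentry_hooks.py, shown later in this lesson
from app.sentry_hooks import before_send_hook, before_send_transaction_hook, custom_traces_sampler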

def setup_sentry(
    dsn: str,
    environment: str,
    release: str,
    traces_sample_rate: float = 0.1,
    profiles_sample_rate: float = 0.0,
) -> None:
    """
    Initialise Sentry error tracking.

    Args:
        dsn: Sentry project DSN from your Sentry project settings
        environment: "production", "staging", "development"
        release: Version string, typically "{app}@{git_sha}"
        traces_sample_rate: Fraction of transactions to trace (0.0–1.0); ignored while traces_sampler is set
        profiles_sample_rate: Fraction of sampled transactions to profile
    """
    # Capture WARNING-level and above log messages as breadcrumbs
    # Capture ERROR-level and above as Sentry events (exceptions)
    logging_integration = LoggingIntegration(
        level=logging.WARNING,      # Breadcrumb level
        event_level=logging.ERROR,  # Event level (creates a Sentry issue)
    )

    sentry_sdk.init(
        dsn=dsn,
        environment=environment,
        release=release,

        # Integrations: auto-capture errors from these frameworks
        integrations=[
            StarletteIntegration(transaction_style="endpoint"),
            FastApiIntegration(transaction_style="endpoint"),
            SqlalchemyIntegration(),
            logging_integration,
        ],

        # Performance monitoring (APM). When traces_sampler is also set,
        # it takes precedence and traces_sample_rate is ignored.
        traces_sample_rate=traces_sample_rate,
        profiles_sample_rate=profiles_sample_rate,

        # Error filtering and enrichment
        before_send=before_send_hook,
        before_send_transaction=before_send_transaction_hook,

        # Attach request data (careful with PII - see before_send)
        send_default_pii=False,  # Don't send cookies, IP, email by default

        # How much of the request body to attach: "never", "small", "medium" or "always"
        max_request_body_size="medium",

        # Per-transaction sampling: skips health checks and metrics noise
        traces_sampler=custom_traces_sampler,
    )

Custom Traces Sampler

def custom_traces_sampler(sampling_context: dict) -> float:
    """
    Determine the sample rate for each transaction individually.
    This gives us more control than a flat `traces_sample_rate`.
    """
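    # With transaction_style="endpoint" the transaction name may be the handler's
    # function name rather than the URL, so prefer the raw ASGI path when present
    asgi_scope = sampling_context.get("asgi_scope") or {}
    path = asgi_scope.get("path") or sampling_context.get("transaction_context", {}).get("name", "")

    # Never trace health checks or metrics endpoints
    if any(p in path for p in ["/health", "/metrics", "/liveness", "/readiness"]):
        return 0.0

    # Honour the upstream service's sampling decision so distributed traces stay complete
    parent_sampled = sampling_context.get("parent_sampled")
    if parent_sampled is not None:
        return 1.0 if parent_sampled else 0.0

    # Trace 10% of normal requests
    return 0.10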

FastAPI Integration

# app/main.py
import os
from fastapi import FastAPI
from app.sentry_config import setup_sentry

# Initialise Sentry before creating the app and before it handles any requests.
# GIT_SHA is injected at build time via an environment variable.
release = f"document-api@{os.environ.get('GIT_SHA', 'unknown')}"

setup_sentry(
    dsn=os.environ["SENTRY_DSN"],
    environment=os.environ.get("ENVIRONMENT", "development"),
    release=release,
    traces_sample_rate=0.1,
)

app = FastAPI()

# The FastApiIntegration automatically captures exceptions from all routes.
# You do not need SentryAsgiMiddleware separately when using the integration.

What a Captured Exception Looks Like in Sentry

When an unhandled exception reaches Sentry, it creates an Issue with:

Title:   KeyError: 'business_finance'
Culprit: app.services.classifier in classify

Stack Trace:
  File "app/api/routes/documents.py", line 87, in classify_document
    result = await classifier.classify(text)

  File "app/services/classifier.py", line 43, in classify
    return CATEGORY_MAP[raw_label]  ← KeyError here

Local Variables at Error Frame:
  text          = "Quarterly earnings report for Q4..."
  raw_label     = "business_finance"  ← Not in CATEGORY_MAP
  CATEGORY_MAP  = {"technology": ..., "science": ..., "sports": ...}

Request:
  Method: POST
  URL:    /api/documents/classify
  Body:   {"text": "Quarterly earnings report..."}

Tags:
  environment:  production
  release:      document-api@a3b8f2c

Events: 10,247 in last 24 hours
Users affected: 3,891
First seen: 2026-01-19

From this single issue view you know exactly which exception was raised, which line raised it, which input triggered it, how many users are affected, and when it started. The six-week mystery resolves in 30 seconds.

2. Enriching Error Context

Sentry captures the stack trace automatically. You need to add the application context that makes the error actionable.

Setting User Context

# app/middleware/sentry_context.py
import sentry_sdk
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware

class SentryContextMiddleware(BaseHTTPMiddleware):
    """
    Enriches every Sentry event with user and request metadata.
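    Must run inside (after) your authentication middleware so request.state.user is set.
    Starlette runs the most recently added middleware first, so call
    app.add_middleware(SentryContextMiddleware) before adding the auth middleware.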
    """

    async def dispatch(self, request: Request, call_next):
        # Set user context - appears in every Sentry event during this request
        user = getattr(request.state, "user", None)
        if user:
            sentry_sdk.set_user({
                "id": user.id,
                "email": user.email,      # Sent even with send_default_pii=False; omit if your PII rules forbid it
                "username": user.username,
                "subscription_tier": user.subscription_tier,
                "organisation_id": user.organisation_id,
            })

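        # Set request-level tags (indexed and searchable in Sentry). api.version is
        # low-cardinality and good for filtering; request.id is unique per request,
        # but tagging it lets you search for the exact request behind an error.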
        sentry_sdk.set_tag("request.id", request.headers.get("X-Request-ID", "unknown"))
        sentry_sdk.set_tag("api.version", request.headers.get("X-API-Version", "v1"))

        # Set context blocks (rich structured data visible in the error detail)
        sentry_sdk.set_context("request_metadata", {
            "client_ip": request.client.host if request.client else "unknown",
            "user_agent": request.headers.get("user-agent", ""),
            "content_type": request.headers.get("content-type", ""),
            "request_id": request.headers.get("X-Request-ID", ""),
        })

        return await call_next(request)

Programmatic Context in Route Handlers

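# app/api/routes/documents.py
import sentry_sdk
from fastapi import APIRouter, Depends, HTTPException, UploadFile

# Hypothetical module paths for the app's own auth, feature flags and classifier
from app.auth import User, get_current_user
from app.feature_flags import get_feature_flag
from app.services.classifier import ClassificationError, classifier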

router = APIRouter()

@router.post("/api/documents/classify")
async def classify_document(
    file: UploadFile,
    current_user: User = Depends(get_current_user),
):
    content = await file.read()

    # Add feature flags to Sentry context
    sentry_sdk.set_context("feature_flags", {
        "new_classifier_v2": get_feature_flag("new_classifier_v2", current_user),
        "async_processing": get_feature_flag("async_processing", current_user),
    })

    # Add business-level extra data
    sentry_sdk.set_extra("document_metadata", {
        "filename": file.filename,
        "size_bytes": len(content),
        "content_type": file.content_type,
    })

    try:
        result = await classifier.classify(content)
        return result
    except ClassificationError as exc:
        # Capture with additional context at the point of failure.
        # new_scope() (sentry-sdk 2.x) replaces the deprecated push_scope().
        with sentry_sdk.new_scope() as scope:
            scope.set_tag("error.category", "classification")
            scope.set_extra("raw_model_output", exc.raw_output)
            scope.set_extra("model_version", exc.model_version)
            sentry_sdk.capture_exception(exc)
        raise HTTPException(status_code=422, detail="Classification failed") from exc

Sentry Context Types

Method	Purpose	Visible in Sentry
`sentry_sdk.set_user({"id": "..."})`	Identifies the user	User tab, issue filters
`sentry_sdk.set_tag("key", "value")`	Low-cardinality labels for filtering	Tags tab, search bar
`sentry_sdk.set_context("name", {...})`	Rich structured data blocks	Additional Data tab
`sentry_sdk.set_extra("key", value)`	Arbitrary extra data	Additional Data tab
`sentry_sdk.capture_message("msg", "error")`	Manually capture a message as an event	Creates a Sentry issue
`sentry_sdk.capture_exception(exc)`	Manually capture an exception	Creates a Sentry issue

3. Breadcrumbs

Breadcrumbs are a trail of events that led up to the error. They answer the question "what was the service doing in the 10 seconds before the exception?"

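Sentry automatically collects breadcrumbs from:

Outgoing HTTP requests (the stdlib integration covers http.client, which requests and urllib3 build on; httpx has its own integration)
Database queries (via the SQLAlchemy integration)
logging calls at or above the LoggingIntegration breadcrumb level (WARNING in the configuration above)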

You add custom breadcrumbs for your application logic:

# app/services/document_processor.py
import sentry_sdk

class DocumentProcessor:

    async def process(self, content: bytes, filename: str) -> Document:
        # Add a breadcrumb for each significant step
        sentry_sdk.add_breadcrumb(
            category="document",
            message=f"Processing started: {filename}",
            data={
                "filename": filename,
                "size_bytes": len(content),
            },
            level="info",
        )

        content_type = await self._detect_content_type(content)
        sentry_sdk.add_breadcrumb(
            category="document",
            message="Content type detected",
            data={"content_type": content_type},
            level="info",
        )

        # Before an external API call - if it fails, the breadcrumb shows context
        sentry_sdk.add_breadcrumb(
            category="http",
            message="Calling embedding API",
            data={
                "url": "https://api.openai.com/v1/embeddings",
                "model": "text-embedding-3-small",
                "approx_input_tokens": len(content.decode("utf-8", errors="replace")) // 4,  # rough: ~4 chars per token
            },
            level="info",
            type="http",
        )
        embeddings = await self._get_embeddings(content)

        # Before a database write
        sentry_sdk.add_breadcrumb(
            category="db",
            message="Inserting document into database",
            data={"table": "documents", "operation": "INSERT"},
            level="info",
            type="query",
        )
        doc = await self._store(filename, content_type, embeddings)

        sentry_sdk.add_breadcrumb(
            category="document",
            message=f"Processing completed: {doc.id}",
            data={"document_id": doc.id},
            level="info",
        )
        return doc


    async def _run_cache_operation(self, key: str) -> bytes | None:
        """Record a cache hit/miss breadcrumb."""
        cached = await self.cache.get(key)
        # Compare with None so an empty cached value still counts as a hit
        result = "hit" if cached is not None else "miss"
        sentry_sdk.add_breadcrumb(
            category="cache",
            message=f"Cache {result}",
            data={"key": key, "result": result},
            level="debug",
        )
        return cached
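Breadcrumbs are kept in a bounded buffer. Once `max_breadcrumbs` is reached (the SDK default is 100), the oldest entries are discarded, so an event only carries the most recent part of the trail. The class below is a stdlib-only model of that behaviour for illustration, not the SDK's implementation. `Breadcrumb`, `BreadcrumbTrail` and their methods are hypothetical names.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Breadcrumb:
    category: str
    message: str
    level: str = "info"
    data: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


class BreadcrumbTrail:
    """Bounded, oldest-first trail: appending past max_breadcrumbs drops the oldest entry."""

    def __init__(self, max_breadcrumbs: int = 100) -> None:
        self._crumbs: deque = deque(maxlen=max_breadcrumbs)

    def add(self, category: str, message: str, level: str = "info", **data) -> None:
        self._crumbs.append(Breadcrumb(category, message, level, data))

    def snapshot(self) -> list:
        # What would be attached to an event captured right now
        return list(self._crumbs)


trail = BreadcrumbTrail(max_breadcrumbs=3)
trail.add("document", "Processing started: report.pdf", size_bytes=204800)
trail.add("cache", "Cache miss", level="debug")
trail.add("http", "Calling embedding API")
trail.add("db", "Inserting document into database")
print([c.category for c in trail.snapshot()])  # ['cache', 'http', 'db']
```

Raising `max_breadcrumbs` keeps a longer trail per event at the cost of larger payloads.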

What Breadcrumbs Look Like in Sentry

When an error occurs after these breadcrumbs, Sentry shows:

BREADCRUMBS (oldest first):
─────────────────────────────────────────────────────
09:14:31.100  [info]    document    Processing started: report.pdf  size=204800
09:14:31.234  [info]    document    Content type detected  content_type=application/pdf
09:14:31.240  [debug]   cache       Cache miss  result=miss
09:14:31.250  [info]    http        Calling embedding API  model=text-embedding-3-small
09:14:31.891  [error]   http        POST https://api.openai.com → 429 Rate Limited
09:14:31.891  [warning] document    Embedding API rate limited - retrying in 5s
09:14:36.900  [info]    http        Calling embedding API (retry 1)
09:14:37.441  [info]    db          Inserting document into database
09:14:37.789  [error]   db          Query failed: deadlock detected
─────────────────────────────────────────────────────
EXCEPTION: DeadlockError at app/services/document_processor.py:87

Without breadcrumbs, you see "DeadlockError" and have to guess why. With breadcrumbs, the sequence points to a likely cause: the OpenAI rate limit forced a 5-second retry, and if a database transaction was already open across that retry, it held its locks far longer than usual, making a deadlock with concurrent writers much more likely.

4. Custom Error Grouping

Sentry groups errors by default using the stack trace. Sometimes this default grouping is wrong:

"Connection pool exhausted" raised from 50 different call paths → 50 separate Sentry issues, each with 1 occurrence. It should be 1 issue with 50 occurrences.
Timeout errors from different operations (for example, raised from the same shared HTTP client helper) → grouped together even though they need different fixes.

Fingerprinting in `before_send`

def before_send_hook(event: dict, hint: dict) -> dict | None:
    """
    Modify events before they are sent to Sentry.
    Used for: filtering noise, enriching context, custom grouping.
    """
    exc_info = hint.get("exc_info")
    if exc_info is None:
        return event

    exc_type, exc_value, _ = exc_info

    # Custom grouping: all connection pool exhaustion errors together
    if "connection pool" in str(exc_value).lower() and "exhausted" in str(exc_value).lower():
        event["fingerprint"] = ["connection-pool-exhausted"]
        event.setdefault("tags", {})["error.category"] = "infrastructure"
        return event

    # Custom grouping: all validation errors by field name
    if exc_type.__name__ == "ValidationError":
        field_names = sorted(getattr(exc_value, "fields", []))
        event["fingerprint"] = ["validation-error"] + field_names
        return event

    # Drop noise: client disconnection errors are not bugs
    if exc_type.__name__ in ("ConnectionResetError", "BrokenPipeError"):
        return None  # Drop the event entirely

    # Drop noise: rate limit errors from external APIs
    if "rate limit" in str(exc_value).lower() or "429" in str(exc_value):
        return None

    # Drop noise: invalid user input (these are user errors, not service bugs)
    if exc_type.__name__ in ("JSONDecodeError",) and "request" in str(exc_value).lower():
        return None

    return event

`__sentry_grouping_hash__` on Exception Classes

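Alternatively, put the grouping key on the exception class itself. The Sentry Python SDK does not look for this attribute on its own: `__sentry_grouping_hash__` is a convention for this codebase, and your before_send hook has to read it and copy it into event["fingerprint"].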

# app/exceptions.py
class ConnectionPoolExhaustedError(Exception):
    """Raised when the database connection pool is exhausted."""

    @property
    def __sentry_grouping_hash__(self) -> str:
        # All instances of this exception class group together,
        # regardless of which code path triggered them
        return "connection-pool-exhausted"


class DocumentValidationError(Exception):
    """Raised when a document fails validation."""

    def __init__(self, message: str, field: str, value: str):
        super().__init__(message)
        self.field = field
        self.value = value

    @property
    def __sentry_grouping_hash__(self) -> str:
        # Group by field name - different fields are different bugs
        return f"document-validation-error-{self.field}"
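Because the SDK does not read `__sentry_grouping_hash__` itself, something has to copy it into the event. The sketch below assumes the exception classes above. It shows a `before_send` step that uses the attribute as the fingerprint whenever the raised exception defines it. `apply_grouping_hash` is a hypothetical helper that you would call at the top of your `before_send_hook`. The exception class is repeated here so the block runs on its own.

```python
import sys
from typing import Optional


class ConnectionPoolExhaustedError(Exception):
    @property
    def __sentry_grouping_hash__(self) -> str:
        return "connection-pool-exhausted"


def apply_grouping_hash(event: dict, hint: dict) -> Optional[dict]:
    """Copy a grouping key defined on the exception class into event["fingerprint"]."""
    exc_info = hint.get("exc_info")
    if exc_info is None:
        return event
    _, exc_value, _ = exc_info
    grouping_hash = getattr(exc_value, "__sentry_grouping_hash__", None)
    if isinstance(grouping_hash, str) and grouping_hash:
        # A fingerprint is a list of strings; events with equal fingerprints group together
        event["fingerprint"] = [grouping_hash]
    return event


try:
    raise ConnectionPoolExhaustedError("pool size 20 exhausted")
except ConnectionPoolExhaustedError:
    event = apply_grouping_hash({}, {"exc_info": sys.exc_info()})
print(event["fingerprint"])  # ['connection-pool-exhausted']
```

Exceptions without the attribute pass through untouched and keep Sentry's default grouping.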

5. The before_send Hook

The before_send hook is called for every event before it leaves the process. It is your last chance to:

Filter out noise (return None to drop the event)
Mask PII that should not go to Sentry
Enrich the event with additional context

# app/sentry_hooks.py
import re
from typing import Optional

_EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
_CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13-16 digits, optional single separators
_JWT_RE = re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\b")

# Exception types that are client errors, not service bugs - drop them
_NOISE_EXCEPTION_NAMES = frozenset({
    "ConnectionResetError",   # Client disconnected
    "BrokenPipeError",        # Client disconnected
    "CancelledError",         # asyncio task cancelled - usually client disconnect
})

# Exception message substrings that indicate noise
_NOISE_PATTERNS = [
    "rate limit",
    "too many requests",
    "connection reset by peer",
    "client disconnected",
]


def _mask_string(s: str) -> str:
    """Mask PII patterns in a string."""
    s = _EMAIL_RE.sub("[EMAIL]", s)
    s = _CARD_RE.sub("[CARD]", s)
    s = _JWT_RE.sub("[JWT]", s)
    return s


def _mask_dict(d: dict) -> dict:
    """Recursively mask PII in a dict (for request body, extra data)."""
    result = {}
    for key, value in d.items():
        if isinstance(value, str):
            result[key] = _mask_string(value)
        elif isinstance(value, dict):
            result[key] = _mask_dict(value)
        elif isinstance(value, list):
            result[key] = [
                _mask_dict(i) if isinstance(i, dict)
                else _mask_string(i) if isinstance(i, str)
                else i
                for i in value
            ]
        else:
            result[key] = value
    return result


def before_send_hook(event: dict, hint: dict) -> Optional[dict]:
    """
    Filter and enrich Sentry events before transmission.

    Returns None to drop the event, or the (modified) event to send it.
    """
    exc_info = hint.get("exc_info")

    # Drop noise exceptions
    if exc_info is not None:
        exc_type, exc_value, _ = exc_info
        if exc_type.__name__ in _NOISE_EXCEPTION_NAMES:
            return None
        exc_str = str(exc_value).lower()
        if any(pattern in exc_str for pattern in _NOISE_PATTERNS):
            return None

    # Mask PII in request body
    request = event.get("request", {})
    if "data" in request:
        if isinstance(request["data"], dict):
            request["data"] = _mask_dict(request["data"])
        elif isinstance(request["data"], str):
            request["data"] = _mask_string(request["data"])

    # Mask PII in extra data
    if "extra" in event:
        event["extra"] = _mask_dict(event["extra"])

    # Remove sensitive headers (header names may arrive in any case)
    headers = request.get("headers", {})
    for name in headers:
        if name.lower() in ("authorization", "cookie", "x-api-key"):
            headers[name] = "[FILTERED]"

    # Add deployment metadata that might not be set in all environments
    event.setdefault("tags", {}).update({
        "python.version": f"{__import__('sys').version_info.major}.{__import__('sys').version_info.minor}",
    })

    return event


def before_send_transaction_hook(event: dict, hint: dict) -> Optional[dict]:
    """Filter performance transactions (APM spans) before sending."""
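    transaction = event.get("transaction", "")
    # With transaction_style="endpoint" the transaction name may be the handler's
    # function name, not the URL path, so check the request URL as well
    url = event.get("request", {}).get("url", "")

    # Drop health check and metrics transactions from APM
    noisy_paths = ["/health", "/metrics", "/liveness", "/readiness", "/favicon.ico"]
    if any(transaction.endswith(p) or url.endswith(p) for p in noisy_paths):
        return None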

    return event

6. Release Tracking

Without release tracking, Sentry cannot tell you which version introduced a regression. With it, you can see:

"This error was first seen in v2.14.0"
"Error rate increased by 300% after deploying v2.15.0"
"This issue was marked RESOLVED IN NEXT RELEASE and has reappeared in v2.16.0"
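A statement like "error rate increased by 300%" comes from comparing error rates per release, which is why every event needs a release tag. The sketch below shows that comparison, assuming you already have error and request counts per release. `ReleaseStats`, the counts, and the 2x regression threshold are hypothetical; Sentry computes its own release-health metrics.

```python
from dataclasses import dataclass


@dataclass
class ReleaseStats:
    release: str
    errors: int
    requests: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def percent_change(previous: ReleaseStats, current: ReleaseStats) -> float:
    """Relative change in error rate between two releases, in percent."""
    if previous.error_rate == 0:
        return float("inf") if current.error_rate > 0 else 0.0
    return (current.error_rate - previous.error_rate) / previous.error_rate * 100


def is_regression(previous: ReleaseStats, current: ReleaseStats, max_ratio: float = 2.0) -> bool:
    # Flag the new release if its error rate exceeds max_ratio times the previous one
    return current.error_rate > previous.error_rate * max_ratio


v14 = ReleaseStats("document-api@v2.14.0", errors=50, requests=100_000)
v15 = ReleaseStats("document-api@v2.15.0", errors=200, requests=100_000)
print(round(percent_change(v14, v15)))  # 300
print(is_regression(v14, v15))          # True
```

Comparing rates rather than raw counts keeps a traffic spike after a deploy from looking like a regression.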

Setting the Release

import os
import subprocess

def get_git_sha() -> str:
    """Get the current git commit SHA, falling back to an env var."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"],
            stderr=subprocess.DEVNULL,
        ).decode().strip()
    except Exception:
        return os.environ.get("GIT_SHA", "unknown")

# In setup_sentry():
release = f"document-api@{get_git_sha()}"

Sentry CLI: Creating Releases

In your CI/CD pipeline, after deploying a new version, create a release in Sentry:

# Install sentry-cli
pip install sentry-cli
# or: curl -sL https://sentry.io/get-cli/ | sh

# Authenticate
export SENTRY_AUTH_TOKEN=your_token
export SENTRY_ORG=your-org
export SENTRY_PROJECT=document-api

GIT_SHA=$(git rev-parse --short HEAD)
RELEASE="document-api@${GIT_SHA}"

# Create the release
sentry-cli releases new "${RELEASE}"

# Associate commits (shows which commits are in this release)
sentry-cli releases set-commits "${RELEASE}" --auto

# Mark the release as deployed to production
sentry-cli releases deploys "${RELEASE}" new \
  --env production \
  --started $(date +%s)

# (For JavaScript/TypeScript frontends: upload source maps here)
# sentry-cli releases files "${RELEASE}" upload-sourcemaps ./dist

Dockerfile Integration

# Dockerfile
FROM python:3.11-slim

# Build arg injected by CI
ARG GIT_SHA=unknown
ENV GIT_SHA=${GIT_SHA}

COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8001"]

# .gitlab-ci.yml
build:
  stage: build
  script:
    - docker build --build-arg GIT_SHA=${CI_COMMIT_SHORT_SHA} -t myapp:${CI_COMMIT_SHORT_SHA} .
    - docker push myapp:${CI_COMMIT_SHORT_SHA}

deploy:
  stage: deploy
  script:
    - kubectl set image deployment/document-api app=myapp:${CI_COMMIT_SHORT_SHA}
    - sentry-cli releases new "document-api@${CI_COMMIT_SHORT_SHA}"
    - sentry-cli releases set-commits "document-api@${CI_COMMIT_SHORT_SHA}" --auto
    - sentry-cli releases deploys "document-api@${CI_COMMIT_SHORT_SHA}" new --env production

7. Performance Monitoring in Sentry

Sentry APM (Application Performance Monitoring) captures transactions (requests) and their child spans. This overlaps with OpenTelemetry tracing from Lesson 03.

When to Use Sentry APM vs OpenTelemetry

Aspect	Sentry APM	OpenTelemetry + Jaeger
Setup complexity	Low - already using Sentry SDK	Medium - separate setup
Error correlation	Native - traces link to errors automatically	Manual - inject trace ID into logs
Cross-service tracing	Yes, if all services use Sentry	Yes, vendor-neutral, any backend
Vendor lock-in	Sentry proprietary	None - OTel standard
Self-hosted option	Self-hosted Sentry (full features, heavier to operate) or GlitchTip (limited APM)	Full OTel stack self-hosted
Best for	Single-service or simple architectures	Microservices with diverse tech stacks

Recommendation: Use OpenTelemetry + Jaeger for distributed tracing (Lesson 03), and use Sentry for error tracking with its built-in performance data as supplementary context. Do not try to replace Jaeger with Sentry APM in a microservices environment.

Sentry Transactions (Quick Setup)

import sentry_sdk

# Manual transaction for a background job (db and search_index are your app's own objects)
async def nightly_reindex(db, search_index) -> None:
    with sentry_sdk.start_transaction(
        name="nightly-reindex",
        op="task",
    ) as transaction:
        transaction.set_tag("job.type", "reindex")

        with sentry_sdk.start_span(op="db.query", description="SELECT documents") as span:
            docs = await db.fetch_all_documents()
            span.set_data("document_count", len(docs))

        with sentry_sdk.start_span(op="index.rebuild", description="Rebuild search index"):
            await search_index.rebuild(docs)

8. Building an Error Workflow

Having Sentry configured is not enough. You need a process that ensures errors are seen, triaged, assigned, and resolved.

Error Workflow: From Alert to Resolution

New Error Detected
       │
       ▼
Sentry creates Issue (groups by fingerprint)
       │
       ▼
Alert fired (Slack / PagerDuty based on rules below)
       │
       ▼
On-call engineer acknowledges in Sentry
       │
       ├── Is this noise? → Configure filter in before_send → IGNORE
       │
       ├── Is this a known issue? → Link to existing ticket → TRACK
       │
       └── Is this a new bug?
               │
               ▼
         Assign to owner (auto-assign by code owners or manually)
               │
               ▼
         Engineer investigates using:
           - Stack trace
           - Breadcrumbs
           - Affected users list
           - Release that introduced it
               │
               ▼
         Fix deployed with release tag
               │
               ▼
         Mark RESOLVED IN NEXT RELEASE
               │
               ▼
         Sentry watches for regression in next release

Sentry Alert Rules

Configure in Sentry UI under Project Settings → Alerts → Issue Alerts:

Alert 1: New Error in Production

Trigger: A new issue is created in environment=production
Action: Notify #backend-alerts Slack channel
        Create PagerDuty incident (severity: warning)

Alert 2: Error Rate Spike

Trigger: Any issue is seen more than 100 times in 1 hour
Action: Notify #incidents Slack channel
        Create PagerDuty incident (severity: critical)

Alert 3: Regression (resolved issue reappears)

Trigger: A resolved issue is seen again in the current release
Action: Notify the issue assignee and #backend-alerts

Alert 4: Many users affected

Trigger: Issue affects more than 50 unique users
Action: Notify product manager + #incidents
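Alert 2 is a count over a sliding window. The sketch below models what such a rule evaluates, purely to make the threshold concrete. Sentry evaluates alert conditions server-side, and `SpikeDetector` is a hypothetical name.

```python
from collections import deque


class SpikeDetector:
    """Fires when more than `threshold` events land within `window_seconds`."""

    def __init__(self, threshold: int = 100, window_seconds: float = 3600.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps: deque = deque()

    def record(self, timestamp: float) -> bool:
        """Record one event (timestamps in non-decreasing order); return True if the rule fires."""
        self._timestamps.append(timestamp)
        # Evict events that have fallen out of the window
        while self._timestamps and self._timestamps[0] <= timestamp - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


detector = SpikeDetector(threshold=100, window_seconds=3600)
fired = [detector.record(t) for t in range(0, 101 * 30, 30)]  # 101 events, 30 s apart
print(fired.count(True))  # 1 (only the 101st event crosses the threshold)
```

The same 101 events spread 40 seconds apart never fire, because the oldest ones slide out of the hour before the count passes 100.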

Ownership Rules

Configure in Sentry Project Settings → Code Owners or Ownership Rules:

# Sentry Ownership Rules (Project Settings → Ownership Rules)
# Format: <type>:<pattern>    <owners>
# Types include path, module, url and tags.<key>; owners are #team-slug or a user's email

path:app/api/routes/documents.py    #backend-team
path:app/services/classifier.py     #ml-team
path:app/services/payments.py       #payments-team
path:app/db/*                       #dba-team

# Tag-based ownership
tags.component:payments             #payments-team
tags.component:classifier           #ml-team

Error Budget Tracking

Connect error tracking to your SLO (covered in Lesson 05). If your SLO is 99.9% availability, your error budget is the remaining 0.1%. Sentry can track the percentage of sessions that encounter an error:

SLO: < 0.1% of sessions encounter an error
Error budget per month: 43.8 minutes of downtime or 0.1% of requests

Current status (from Sentry dashboards):
  Error rate this month: 0.047%
  Burn rate: 0.47x (0.047% / 0.1%; healthy - burning at less than 1x)
  Error budget remaining at month end if this rate holds: 53%
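Burn rate is the observed error rate divided by the error rate the SLO allows. The share of the budget consumed is the burn rate multiplied by the fraction of the period that has elapsed, assuming steady traffic. The sketch below works through that arithmetic for a 99.9% SLO and an observed error rate of 0.047%. The function names are illustrative, and 30.44 days is an average month (365.25 / 12).

```python
def burn_rate(observed_error_rate: float, slo_target: float) -> float:
    """Observed error rate divided by the allowed error rate (1 - SLO)."""
    return observed_error_rate / (1.0 - slo_target)


def budget_remaining(observed_error_rate: float, slo_target: float, elapsed_fraction: float) -> float:
    """Fraction of the period's error budget left, assuming steady traffic."""
    return 1.0 - burn_rate(observed_error_rate, slo_target) * elapsed_fraction


def monthly_downtime_minutes(slo_target: float, days: float = 30.44) -> float:
    """Allowed full-outage minutes per month for an availability SLO."""
    return (1.0 - slo_target) * days * 24 * 60


print(round(burn_rate(0.00047, 0.999), 2))              # 0.47
print(round(budget_remaining(0.00047, 0.999, 1.0), 2))  # 0.53
print(round(monthly_downtime_minutes(0.999), 1))        # 43.8
```

Halfway through the month at the same rate, `budget_remaining(0.00047, 0.999, 0.5)` is about 0.765.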

9. Self-Hosted Alternative: GlitchTip

GlitchTip is an open-source, Sentry-compatible error tracking server. It uses the same Sentry SDK - you only change the DSN to point at your own server.

docker-compose Setup

# docker-compose.yml (GlitchTip)
services:
  glitchtip-db:
    image: postgres:16
    environment:
      POSTGRES_DB: glitchtip
      POSTGRES_USER: glitchtip
      POSTGRES_PASSWORD: glitchtip_password

  glitchtip-redis:
    image: redis:7

  glitchtip:
    image: glitchtip/glitchtip:v4
    ports:
      - "9000:8000"
    environment:
      DATABASE_URL: postgresql://glitchtip:glitchtip_password@glitchtip-db:5432/glitchtip
      REDIS_URL: redis://glitchtip-redis:6379
      SECRET_KEY: your-secret-key-here
      EMAIL_URL: smtp://user:password@smtp.example.com:587   # placeholder SMTP credentials
      GLITCHTIP_DOMAIN: http://localhost:9000
      DEFAULT_FROM_EMAIL: glitchtip@example.com            # placeholder sender address
    depends_on:
      - glitchtip-db
      - glitchtip-redis

  glitchtip-worker:
    image: glitchtip/glitchtip:v4
    command: ./bin/run-celery-with-beat.sh
    environment:
      DATABASE_URL: postgresql://glitchtip:glitchtip_password@glitchtip-db:5432/glitchtip
      REDIS_URL: redis://glitchtip-redis:6379
      SECRET_KEY: your-secret-key-here
    depends_on:
      - glitchtip-db
      - glitchtip-redis

Pointing Your Python App at GlitchTip

sentry_sdk.init(
    dsn="http://your_public_key@localhost:9000/1",  # GlitchTip DSN
    environment="production",
    release=f"document-api@{get_git_sha()}",
    integrations=[FastApiIntegration(), SqlalchemyIntegration()],
    before_send=before_send_hook,
)

Everything else is identical - the SDK does not know it is talking to GlitchTip instead of Sentry.
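A DSN packs everything the SDK needs to reach the ingest server into one URL of the form `{scheme}://{public_key}@{host}/{project_id}`. Switching to GlitchTip only changes those values. The stdlib sketch below splits a DSN into its parts. `parse_dsn` and `DsnParts` are hypothetical helpers, not SDK functions.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class DsnParts(NamedTuple):
    scheme: str
    public_key: str
    host: str
    port: Optional[int]
    project_id: str


def parse_dsn(dsn: str) -> DsnParts:
    parts = urlsplit(dsn)
    # The project ID is the last path segment; anything before it is a path prefix
    project_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if not parts.username or not project_id:
        raise ValueError(f"not a valid DSN: {dsn!r}")
    return DsnParts(parts.scheme, parts.username, parts.hostname or "", parts.port, project_id)


print(parse_dsn("http://your_public_key@localhost:9000/1"))
# DsnParts(scheme='http', public_key='your_public_key', host='localhost', port=9000, project_id='1')
```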

GlitchTip vs Sentry Comparison

Feature	GlitchTip (self-hosted)	Sentry (cloud)
Cost	Free (hosting costs only)	From $26/mo
Error grouping	Yes	Yes (more advanced ML-based)
Performance APM	Limited	Full
Source maps	Yes	Yes
Release tracking	Yes	Yes
Data sovereignty	Full control	Data in Sentry's cloud
Maintenance burden	You manage upgrades	None
Best for	Teams with data sovereignty requirements	Teams wanting zero maintenance

Complete Error Tracking Setup Checklist

# app/sentry_config.py - complete production setup

import os
import sys
import logging
import sentry_sdk
from sentry_sdk.integrations.fastapi import FastApiIntegration
from sentry_sdk.integrations.starlette import StarletteIntegration
from sentry_sdk.integrations.sqlalchemy import SqlalchemyIntegration
from sentry_sdk.integrations.logging import LoggingIntegration
from app.sentry_hooks import before_send_hook, before_send_transaction_hook, custom_traces_sampler


def setup_sentry() -> None:
    dsn = os.environ.get("SENTRY_DSN")
    if not dsn:
        logging.getLogger(__name__).warning(
            "SENTRY_DSN not set - error tracking disabled"
        )
        return

    environment = os.environ.get("ENVIRONMENT", "development")
    git_sha = os.environ.get("GIT_SHA", "unknown")
    service_name = os.environ.get("SERVICE_NAME", "unknown-service")
    release = f"{service_name}@{git_sha}"

    sentry_sdk.init(
        dsn=dsn,
        environment=environment,
        release=release,
        integrations=[
            StarletteIntegration(transaction_style="endpoint"),
            FastApiIntegration(transaction_style="endpoint"),
            SqlalchemyIntegration(),
            LoggingIntegration(
                level=logging.WARNING,
                event_level=logging.ERROR,
            ),
        ],
        traces_sampler=custom_traces_sampler,
        before_send=before_send_hook,
        before_send_transaction=before_send_transaction_hook,
        send_default_pii=False,
        max_breadcrumbs=50,
        attach_stacktrace=True,          # Also attach stack traces to message events (capture_message, logged errors)
        in_app_include=["app"],          # Mark frames from the 'app' package as in-app; library frames are de-emphasised, not removed
        max_request_body_size="medium",
    )

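    # Tag every event with Python version and runtime info.
    # configure_scope() is deprecated in sentry-sdk 2.x; tags set on the
    # global scope apply to every event this process sends.
    global_scope = sentry_sdk.get_global_scope()
    global_scope.set_tag("python.version", ".".join(map(str, sys.version_info[:3])))
    global_scope.set_tag("runtime", sys.implementation.name)  # e.g. "cpython"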
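The setup above imports custom_traces_sampler from app.sentry_hooks, which is defined earlier in the lesson. For readers jumping straight to this checklist, here is a minimal sketch of what such a sampler can look like. It assumes the sentry-sdk 2.x behaviour where the sampler receives a sampling_context dict containing parent_sampled and, from the ASGI integration, the raw ASGI scope under asgi_scope. The excluded paths and the 10% default rate are illustrative assumptions, not values from the lesson.

```python
from typing import Any

# Hypothetical paths that are not worth tracing (probes and metric scrapes).
NEVER_SAMPLE_PATHS = {"/health", "/healthz", "/metrics"}
DEFAULT_RATE = 0.1  # Illustrative default: trace 10% of all other requests.


def custom_traces_sampler(sampling_context: dict[str, Any]) -> float:
    """Return a sample rate between 0.0 and 1.0 for a new transaction."""
    # If an upstream service already decided, follow it so the
    # distributed trace is either complete or absent, never partial.
    parent_sampled = sampling_context.get("parent_sampled")
    if parent_sampled is not None:
        return 1.0 if parent_sampled else 0.0

    # Assumption: the ASGI integration exposes the raw ASGI scope here.
    path = (sampling_context.get("asgi_scope") or {}).get("path", "")
    if path in NEVER_SAMPLE_PATHS:
        return 0.0
    return DEFAULT_RATE
```

Matching on the ASGI path rather than the transaction name matters here: with transaction_style="endpoint", transactions are named after the handler function, not the URL.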

Interview Questions and Answers

Q1: Sentry is grouping two completely different bugs - a TimeoutError in the payment service and a TimeoutError in the document OCR service - into the same issue because they have the same exception class. How do you fix this without changing the code that raises the exceptions?

Configure a custom fingerprint in the before_send hook. Inspect the stack trace in the event to determine which module raised the exception, and use that as part of the fingerprint:

def before_send_hook(event, hint):
    exc_info = hint.get("exc_info")
    if exc_info and exc_info[0].__name__ == "TimeoutError":
        values = event.get("exception", {}).get("values") or [{}]
        frames = (values[-1].get("stacktrace") or {}).get("frames") or []
        # Prefer the innermost in-app frame: the innermost frame overall is often
        # inside a shared library (httpx, socket) in both services
        app_frames = [f for f in frames if f.get("in_app")] or frames
        if app_frames:
            module = app_frames[-1].get("module", "unknown")
            event["fingerprint"] = ["TimeoutError", module]
    return event

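Alternatively, if you are allowed to change the raising code, define separate exception subclasses (PaymentTimeoutError, OCRTimeoutError) and use __sentry_grouping_hash__ on each. Strictly speaking this goes beyond the question's constraint, but it is the cleanest long-term solution because the exception type itself documents which dependency timed out.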
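For the subclass route, here is a minimal sketch. The class names come from the answer above; the grouping keys ("payment-timeout", "ocr-timeout") are illustrative. It assumes __sentry_grouping_hash__ is a project convention that your own before_send hook reads and turns into a fingerprint, so the hook below does that explicitly rather than relying on the SDK to notice the attribute. The hint is built by hand to mimic the (type, value, traceback) tuple the SDK passes.

```python
from typing import Any


class PaymentTimeoutError(TimeoutError):
    # Stable grouping key, read by the before_send hook below.
    __sentry_grouping_hash__ = "payment-timeout"


class OCRTimeoutError(TimeoutError):
    __sentry_grouping_hash__ = "ocr-timeout"


def before_send_hook(event: dict[str, Any], hint: dict[str, Any]) -> dict[str, Any]:
    exc_info = hint.get("exc_info")
    if exc_info:
        # Plain TimeoutError (or any class without the attribute) yields None.
        grouping_key = getattr(exc_info[0], "__sentry_grouping_hash__", None)
        if grouping_key:
            # One Sentry issue per grouping key, regardless of stack trace shape.
            event["fingerprint"] = [grouping_key]
    return event
```

Because both classes subclass TimeoutError, existing `except TimeoutError` handlers keep working unchanged.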

Q2: You have a multi-tenant SaaS with 10,000 organisations. One organisation is generating 95% of your Sentry errors because their data is triggering an edge case. How do you see this in Sentry, and how do you handle the alerting noise while the bug is being fixed?

In Sentry, open the affected issue and check its tag breakdown: if you set organisation_id as a tag, the issue page shows how events are distributed across organisations, and you can filter the issue stream by that tag. To reduce noise while the fix is in progress: (1) In the before_send hook, check whether organisation_id matches the problematic organisation and add a tag noise=true, then configure your Sentry alert rules to exclude events tagged noise=true. (2) Or archive (ignore) this specific issue for a fixed period, such as 24 hours, so it stops alerting while the fix is deployed. (3) The cleanest engineering solution: add a before_send filter that samples this error type at 1% whenever the organisation tag matches the problematic organisation, keeping Sentry useful while reducing volume by 99%.
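Options (1) and (3) from the answer above can be sketched together as one before_send filter. This assumes organisation_id was set as a scope tag, so it appears in event["tags"] as a dict by the time before_send runs. NOISY_ORG_ID, NOISY_EXC_TYPE, and the 1% rate are hypothetical placeholders, and the random function is injectable only so the behaviour can be tested deterministically.

```python
import random
from typing import Any, Callable, Optional

Event = dict[str, Any]
Hint = dict[str, Any]

NOISY_ORG_ID = "org_4821"      # Hypothetical: the organisation triggering the edge case.
NOISY_EXC_TYPE = "KeyError"    # Hypothetical: the exception type being sampled down.
NOISY_SAMPLE_RATE = 0.01       # Keep roughly 1% of matching events.


def make_before_send(
    rand: Callable[[], float] = random.random,
) -> Callable[[Event, Hint], Optional[Event]]:
    """Build a before_send hook; `rand` is injectable so tests are deterministic."""

    def before_send_hook(event: Event, hint: Hint) -> Optional[Event]:
        tags = event.get("tags")
        exc_info = hint.get("exc_info")
        is_noisy = (
            isinstance(tags, dict)
            and tags.get("organisation_id") == NOISY_ORG_ID
            and exc_info is not None
            and exc_info[0].__name__ == NOISY_EXC_TYPE
        )
        if not is_noisy:
            return event  # Every other organisation and error type is untouched.
        if rand() >= NOISY_SAMPLE_RATE:
            return None  # Drop about 99% of this organisation's matching events.
        tags["noise"] = "true"  # Survivors are tagged so alert rules can exclude them.
        return event

    return before_send_hook
```

While this filter is active, Sentry's event counts for the issue under-report by roughly 100x, so read its trend graph with that in mind and remove the filter once the fix ships.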

Q3: What is the difference between sentry_sdk.capture_exception(), sentry_sdk.capture_message(), and letting Sentry capture an exception automatically from an unhandled exception?

sentry_sdk.capture_exception(exc) manually sends an exception to Sentry from any point in your code, typically inside a try/except block where you handle the exception locally but still want to track it; called with no argument inside an except block, it captures the exception currently being handled. Events sent this way are marked as "handled." Exceptions that bubble up unhandled and are captured automatically by the FastAPI integration are marked as "unhandled," which Sentry highlights in the issue view and lets you filter on (error.unhandled:true). sentry_sdk.capture_message("text", level="error") sends an event without an exception, which is useful for non-exception conditions such as "payment webhook signature validation failed," where you handle the case but still want visibility. In general: let unhandled exceptions be captured automatically, use capture_exception for handled errors you still want to track, and use capture_message sparingly for important non-exception events.

Q4: Your before_send hook is dropping about 40% of all Sentry events as noise. How do you verify that you are not accidentally dropping real bugs?

Several approaches: (1) Log every dropped event with your structured logger, e.g. logger.debug("sentry.event.dropped", exc_type=..., reason=...), so you keep a record of what is being filtered without sending it to Sentry. (2) Add a Prometheus counter sentry_events_dropped_total with a reason label, then alert if the drop rate for a specific reason suddenly changes. (3) Review every change to before_send in code review, treating each return None as a policy decision that must be justified with a comment. (4) Run in "shadow mode" for a week in staging: modify before_send to send events to a separate Sentry project instead of dropping them, so you can compare what you would otherwise be missing. (5) Use sampling-based drops: instead of dropping 100% of a noisy error type, drop 99% and let 1% through, which shows you whether the error type is changing.
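Points (1), (2), and (5) can be combined in a small wrapper around the filtering policy. In this sketch, collections.Counter stands in for a Prometheus counter such as sentry_events_dropped_total (a real service would use prometheus_client instead), classify_noise is a hypothetical policy function that returns a drop reason or None, and the ConnectionResetError rule and 1% pass-through rate are illustrative assumptions.

```python
import logging
import random
from collections import Counter
from typing import Any, Callable, Optional

logger = logging.getLogger("app.sentry_hooks")

Event = dict[str, Any]
Hint = dict[str, Any]

# Stand-in for a Prometheus counter labelled by drop reason.
DROPPED_BY_REASON: Counter[str] = Counter()
PASS_THROUGH_RATE = 0.01  # Let 1% of "noise" through so changes stay visible.


def classify_noise(event: Event, hint: Hint) -> Optional[str]:
    """Hypothetical policy: return a drop reason, or None to keep the event."""
    exc_info = hint.get("exc_info")
    if exc_info and issubclass(exc_info[0], ConnectionResetError):
        return "client_disconnect"  # Justification: the client hung up, not our bug.
    return None


def make_before_send(
    rand: Callable[[], float] = random.random,
) -> Callable[[Event, Hint], Optional[Event]]:
    def before_send_hook(event: Event, hint: Hint) -> Optional[Event]:
        reason = classify_noise(event, hint)
        if reason is None:
            return event
        if rand() < PASS_THROUGH_RATE:
            # Sampled survivor: tag it so it can be searched for in Sentry.
            event.setdefault("tags", {})["sampled_noise_reason"] = reason
            return event
        DROPPED_BY_REASON[reason] += 1  # Counted, so rate changes can be alerted on.
        logger.debug("sentry.event.dropped reason=%s", reason)
        return None

    return before_send_hook
```

Keeping the policy (classify_noise) separate from the accounting means every new drop rule is automatically counted, logged, and sampled without extra code.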

Q5: A teammate argues that Sentry and structured logging (Lesson 01) are redundant - "if we have Sentry, why log the error at all?" How do you respond?

They solve different problems and complement each other. Sentry captures exceptions with full context - stack traces, breadcrumbs, user info, local variables - and groups, deduplicates, and alerts on them. It is optimised for the "what broke and why?" question. Structured logging captures a timestamped stream of events for every request, successful or not. It answers "what did the service do at 09:14:32?" - which includes the successful operations that provide context for understanding why the error occurred. Additionally: (1) Logs capture non-exception events (successful requests, cache hits, business events) that Sentry does not. (2) Logs are searchable by arbitrary fields in Loki/Kibana - Sentry search is limited to its own data model. (3) Logs correlate with metrics and traces via request_id and trace_id. (4) Logs are your audit trail; Sentry is your error inbox. You need both.
